1.
Front Artif Intell ; 7: 1287877, 2024.
Article in English | MEDLINE | ID: mdl-38405218

ABSTRACT

This study assessed the influence of speaker similarity and sample length on the performance of an automatic speaker recognition (ASR) system utilizing the SpeechBrain toolkit. The dataset comprised recordings from 20 male identical twin speakers engaged in spontaneous dialogues and interviews. Performance evaluations involved comparing identical twins, all speakers in the dataset (including twin pairs), and all speakers excluding twin pairs. Speech samples, ranging from 5 to 30 s, were assessed in terms of equal error rate (EER) and log-likelihood-ratio cost (Cllr). Results highlight the substantial challenge posed by identical twins to the ASR system, leading to a decrease in overall speaker recognition accuracy. Furthermore, analyses based on longer speech samples outperformed those using shorter samples. As sample length increased, standard deviation values for both intra- and inter-speaker similarity scores decreased, indicating reduced variability in estimating speaker similarity/dissimilarity levels in longer speech stretches compared to shorter ones. The study also uncovered varying degrees of likeness among identical twins, with certain pairs presenting a greater challenge for ASR systems. These outcomes align with prior research and are discussed within the context of relevant literature.
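The equal error rate mentioned in this abstract can be illustrated with a minimal sketch. It is not the study's pipeline or a SpeechBrain API: the function name `equal_error_rate`, the toy scores, and the simple threshold sweep are all assumptions made for illustration. Scores are taken to be similarity scores, where higher means "more likely the same speaker".

```python
def equal_error_rate(same_scores, diff_scores):
    """Return (eer, threshold) for similarity scores (higher = more similar).

    Hypothetical helper: sweeps every observed score as a decision threshold
    and keeps the one where false acceptance and false rejection are closest.
    """
    thresholds = sorted(set(same_scores) | set(diff_scores))
    best = None
    for t in thresholds:
        # False rejection: same-speaker pairs scoring below the threshold.
        frr = sum(s < t for s in same_scores) / len(same_scores)
        # False acceptance: different-speaker pairs scoring at or above it.
        far = sum(d >= t for d in diff_scores) / len(diff_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


# Toy scores (invented): one same-speaker pair scores lower than one
# different-speaker pair, as might happen with a very similar twin pair.
same = [0.9, 0.8, 0.7, 0.4]
diff = [0.6, 0.3, 0.2, 0.1]
eer, thr = equal_error_rate(same, diff)
print(eer, thr)
```

A lower EER means better separation of same-speaker and different-speaker comparisons; overlapping score distributions, as reported for twin pairs, push it upward.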

2.
Front Psychol ; 14: 1101187, 2023.
Article in English | MEDLINE | ID: mdl-37138997

ABSTRACT

This study aimed to assess what we refer to as the speaker discriminatory power asymmetry and its forensic implications in comparisons performed in different speaking styles: spontaneous dialogues vs. interviews. We also addressed the impact of data sampling on speaker-discriminatory performance for different acoustic-phonetic estimates. The participants were 20 male speakers of Brazilian Portuguese from the same dialectal area. The speech material consisted of spontaneous telephone conversations between familiar individuals, and interviews conducted between each individual participant and the researcher. Nine acoustic-phonetic parameters were chosen for the comparisons, spanning from temporal and melodic to spectral acoustic-phonetic estimates. Ultimately, an analysis based on the combination of different parameters was also conducted. Two speaker discriminatory metrics were examined: log-likelihood-ratio cost (Cllr) and Equal Error Rate (EER) values. A general speaker discriminatory trend was suggested when assessing the parameters individually. Parameters pertaining to the temporal acoustic-phonetic class showed the weakest speaker contrasting power, as evidenced by their relatively higher Cllr and EER values. Moreover, from the set of acoustic parameters assessed, spectral parameters, mainly high formant frequencies, i.e., F3 and F4, were the best performing in terms of speaker discrimination, showing the lowest EER and Cllr scores. The results appear to suggest a speaker discriminatory power asymmetry concerning parameters from different acoustic-phonetic classes, in which temporal parameters tended to present a lower discriminatory power. The speaking style mismatch also seemed to considerably impact the speaker comparison task, by undermining the overall discriminatory performance. A statistical model based on the combination of different acoustic-phonetic estimates was found to perform best in this case. Finally, data sampling proved to be of crucial relevance for the reliability of discriminatory power assessment.
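For readers unfamiliar with the Cllr metric named in this abstract, the following is a minimal sketch of its standard definition, Cllr = ½ [mean log2(1 + 1/LR) over same-speaker comparisons + mean log2(1 + LR) over different-speaker comparisons]. The function name `cllr` and the example likelihood ratios are illustrative assumptions and do not come from the study.

```python
import math


def cllr(same_lrs, diff_lrs):
    """Log-likelihood-ratio cost from likelihood ratios (not log-LRs).

    same_lrs: LRs from same-speaker comparisons (ideally much greater than 1).
    diff_lrs: LRs from different-speaker comparisons (ideally much less than 1).
    """
    # Penalty for same-speaker comparisons that received low LRs.
    same_term = sum(math.log2(1 + 1 / lr) for lr in same_lrs) / len(same_lrs)
    # Penalty for different-speaker comparisons that received high LRs.
    diff_term = sum(math.log2(1 + lr) for lr in diff_lrs) / len(diff_lrs)
    return 0.5 * (same_term + diff_term)


# A system that always reports LR = 1 carries no information: Cllr is 1.
uninformative = cllr([1.0, 1.0], [1.0, 1.0])
# Strong, correctly directed LRs (invented values) give a Cllr near 0.
informative = cllr([1000.0], [0.001])
print(uninformative, informative)
```

Values near 0 indicate informative, well-calibrated likelihood ratios, while values at or above 1 indicate a system no better than reporting LR = 1 throughout, which is why higher Cllr is read as weaker discriminatory power.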

3.
Phonetica ; 79(5): 459-512, 2022 10 26.
Article in English | MEDLINE | ID: mdl-36420530

ABSTRACT

Albanian comprises two main dialects, Gheg and Tosk, as well as a Tosk-based standard variety. The study was concerned with the extent to which the vocalic system of Southern Gheg, spoken in the capital city Tirana and surrounding rural area, has been shaped in urban versus rural contexts by extensive contact with Tosk and the standard. Through an apparent-time comparison across two groups of adults and first-grade children, one from Tirana and the other from the nearby village of Bërzhitë, we investigated three vocalic features of Southern Gheg: rounding of /a/, vowel lengthening and monophthongization, all of which were expected to be maintained more in the rural community than in the urban one, and also more by adults than by children. Our results showed that rounding was changing in both locations, monophthongization in the urban setting only, while lengthening was well preserved. In general, the changes found for rounding and monophthongization were more advanced in children than adults. The relative complexity of the features is the main factor explored to account for why some features change faster than others. The reasons for a possible increase in the phonological complexity of Southern Gheg are also discussed.


Subjects
Language, Rural Population, Adult, Child, Humans, Linguistics
4.
Phonetica ; 79(3): 219-245, 2022 06 27.
Article in English | MEDLINE | ID: mdl-35981718

ABSTRACT

The prosodic structure of under-researched languages in the Trade Malay language family is poorly understood. Although boundary marking has been uncontroversially shown to be the major prosodic function in these languages, studies on the use of pitch accents to highlight important words in a phrase remain inconclusive. In addition, most knowledge of pitch accents is based on well-researched languages such as those of the West Germanic language family. This paper reports two word identification experiments comparing Papuan Malay with the pitch accent language American English, in order to investigate the extent to which the demarcating and highlighting functions of prosody can be disentangled. To this end, target words were presented to native listeners of both languages and differed with respect to their position in the phrase (medial or final) and the shape of their f0 movement (original or manipulated). Reaction times for the target word identifications revealed overall faster responses for original and final words compared to manipulated and medial ones. The results add to previous findings on the facilitating effect of pitch accents and further improve our prosodic knowledge of under-researched languages.


Subjects
Language, Speech Perception, Humans, Malaysia, Phonetics, Reaction Time, Speech Acoustics, United States
5.
Data Brief ; 42: 108062, 2022 Jun.
Article in English | MEDLINE | ID: mdl-35356315

ABSTRACT

The data reported in this article are non-coronal fricative measurements from 10 (5 male; 5 female) native speakers of Zhongjiang Chinese. Each speaker produced 10 repetitions of 90 monosyllabic words beginning with either a velar fricative, /x/, or a labiodental fricative, /f/. The measurements reported include spectral properties often used to characterize fricative variation: spectrum center of gravity (CoG), spectrum standard deviation (SD), spectrum skew, spectrum kurtosis, maximum amplitude frequency, and maximum amplitude. These measurements are compared across two data filtering conditions: a high-pass filter condition, in which a 300 Hz high-pass filter was applied to the data before spectral measurements were calculated, and a no-filter condition. The 90 monosyllabic words include the target fricatives in different phonetic environments. Target words include some that historically derive from different fricatives and show variation across regional varieties of Mandarin Chinese. Subsets of the target materials enable several closely matched comparisons of items. We describe measurements across the whole dataset and compare the effect that filtering has on the measurements. The data also include a CSV file with measurements of each token, which enables comparison of phonetic contexts, lexical effects and individual differences in fricative variation beyond those described here. For further discussion of the data, please refer to the full-length article entitled "The role of gestural timing in non-coronal fricative mergers in Southwestern Mandarin: acoustic evidence from a dialect island", Journal of Phonetics [6].
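The spectral measures listed in this abstract are commonly computed as power-weighted moments of the spectrum. The sketch below illustrates those definitions on a toy spectrum. It is an assumption-laden illustration rather than the authors' procedure: the function name `spectral_moments`, the toy values, and the brick-wall cutoff (bins below the cutoff are simply dropped, standing in for a real 300 Hz high-pass filter) are all hypothetical.

```python
def spectral_moments(freqs_hz, power, highpass_hz=None):
    """Power-weighted spectral moments of a (frequency, power) spectrum.

    highpass_hz: if given, bins below this frequency are discarded, a crude
    stand-in for high-pass filtering before measurement.
    """
    bins = [(f, p) for f, p in zip(freqs_hz, power)
            if highpass_hz is None or f >= highpass_hz]
    total = sum(p for _, p in bins)
    # First moment: center of gravity (CoG).
    cog = sum(f * p for f, p in bins) / total

    def central(k):
        # k-th central moment around the center of gravity.
        return sum(((f - cog) ** k) * p for f, p in bins) / total

    variance = central(2)
    return {
        "cog": cog,
        "sd": variance ** 0.5,
        "skew": central(3) / variance ** 1.5,
        # Excess kurtosis: 0 for a Gaussian-shaped spectrum.
        "kurtosis": central(4) / variance ** 2 - 3.0,
    }


# Toy spectrum (invented) with strong low-frequency energy at 100 Hz.
freqs = [100.0, 1000.0, 2000.0, 3000.0]
power = [5.0, 1.0, 2.0, 1.0]
unfiltered = spectral_moments(freqs, power)
filtered = spectral_moments(freqs, power, highpass_hz=300.0)
print(unfiltered["cog"], filtered["cog"])
```

In this toy case, removing the low-frequency bin raises the center of gravity, which shows why the two filtering conditions described above can yield different values for the same token.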

6.
J Child Lang ; 48(3): 541-568, 2021 05.
Article in English | MEDLINE | ID: mdl-34726145

ABSTRACT

Our motivation was to examine how toddler (2;6) and adult speakers of American English prosodically realize information status categories. The aims were three-fold: 1) to analyze how adults phonologically make information status distinctions; 2) to examine how these same categories are signaled in toddlers' spontaneous speech; and 3) to analyze the three primary acoustic correlates of prosody (F0, intensity, and duration). During a spontaneous speech task designed as an interactive game, a set of target nouns was elicited as one of three types (new, given, corrective). Results show that toddlers primarily used H* across information status categories, with secondary preferences for deaccenting given information and for using L+H* for corrective information. Only duration distinguished information status, and duration, average pitch, and intensity differentiated pitch accent types for both adults and children. Discussion includes how pitch accent selection and input play a role in guiding prosodic realizations of information status.


Subjects
Speech Perception, Speech, Adult, Preschool Child, Humans, Language, Phonetics, Speech Acoustics, Speech Production Measurement
7.
J Voice ; 2021 Oct 07.
Article in English | MEDLINE | ID: mdl-34629229

ABSTRACT

OBJECTIVE: To assess the speaker-discriminatory potential of a set of fundamental frequency estimates in comparisons within identical twin pairs and in cross-pair comparisons (i.e., among all speakers). PARTICIPANTS: A total of 20 Brazilian Portuguese speakers of the same dialect, namely 10 male identical twin pairs aged between 19 and 35, were recruited. METHOD: The participants were recorded directly through professional microphones while taking part in a spontaneous dialogue over mobile phones. Acoustic measurements were performed in connected speech samples and in lengthened vowels at least 160 ms long produced during spontaneous speech. RESULTS: f0 baseline, central tendency, and extreme values were found to be the most discriminatory in intra-twin pair and cross-pair comparisons. These were also the estimates displaying the largest effect sizes. Overall, only three identical twins were found statistically different regarding their f0 patterns in connected speech, but not for lengthened vowel-based f0 metrics. Estimates of f0 variation and modulation were found to be the least discriminatory across speakers, which may signal the influence of speaking style and dialect on dynamic patterns of f0. Concerning system performance, the base value of f0 (f0 baseline) was found to be the most reliable metric, displaying the lowest equal error rate (EER). CONCLUSIONS: The outcomes suggest that, although identical twins were very closely related regarding their f0 patterns, some pairs could still be differentiated acoustically, but only in connected speech. Such findings reinforce the relevance of analyzing long-term f0 metrics for speaker comparison purposes, with particular consideration to f0 baseline. Furthermore, f0 differences across subjects appeared more pronounced in connected speech than in lengthened vowels.

8.
Phonetica ; 78(3): 201-240, 2021 06 23.
Article in English | MEDLINE | ID: mdl-34162023

ABSTRACT

The present study examines the relationship between the two grammars of bilingual speakers, the linguistic ecologies in which the L1 and L2 become active, and how these topics can be explored in a bilingual community undergoing L1 attrition. Our experiment focused on the production of intervocalic phonemic voiced stops for L1-Afrikaans/L2-Spanish bilinguals in Patagonia, Argentina. While these phonemes undergo systematic intervocalic lenition in Spanish (e.g., /b d ɡ/ > [β ð ɣ]), they do not in Afrikaans (e.g., /b d/ > [b d]). The bilingual participants in our study produced target Afrikaans and Spanish words in unilingual and code-switched speaking contexts. The results show that: (i) the participants produce separate phonetic categories in Spanish and Afrikaans; (ii) code-switching affects the production of the target sounds asymmetrically, such that L1 Afrikaans influences the production of L2 Spanish sounds but not vice versa; and (iii) this L1-to-L2 influence remains robust despite the instability of the L1 itself. Altogether, our findings speak to the persistence of a bilingual's L1 phonological grammar despite cross-generational L1 attrition.


Subjects
Multilingualism, Voice, Humans, Language, Phonetics, Sound
9.
Entropy (Basel) ; 22(3), 2020 Mar 13.
Article in English | MEDLINE | ID: mdl-33286105

ABSTRACT

Steady-state vowels are vowels that are uttered with a momentarily fixed vocal tract configuration and with steady vibration of the vocal folds. In this steady state, the vowel waveform appears as a quasi-periodic string of elementary units called pitch periods. Humans perceive this quasi-periodic regularity as a definite pitch. Likewise, so-called pitch-synchronous methods exploit this regularity by using the duration of the pitch periods as a natural time scale for their analysis. In this work, we present a simple pitch-synchronous method using a Bayesian approach for estimating formants that slightly generalizes the basic approach of modeling the pitch periods as a superposition of decaying sinusoids, one for each vowel formant, by explicitly taking into account the additional low-frequency content in the waveform which arises not from formants but rather from the glottal pulse. We model this low-frequency content in the time domain as a polynomial trend function that is added to the decaying sinusoids. The problem then reduces to a rather familiar one in macroeconomics: estimate the cycles (our decaying sinusoids) independently from the trend (our polynomial trend function); in other words, detrend the waveform of steady-state vowels. We show how to do this efficiently.
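The signal model described in this abstract, decaying sinusoids plus a polynomial trend, can be sketched as follows. This is not the paper's Bayesian estimator: as a deliberate simplification, the formant frequencies and decay rates are assumed to be known, so the remaining amplitudes and trend coefficients are linear and can be recovered by ordinary least squares. All names (`design_row`, `solve`, `fit_period`) and the synthetic values are hypothetical. Time is in milliseconds and frequency in kHz to keep the small linear system well scaled.

```python
import math


def design_row(t_ms, formants, trend_order):
    """Basis functions at time t: decaying cos/sin per formant, then 1, t, ..."""
    row = []
    for freq_khz, decay_per_ms in formants:
        envelope = math.exp(-decay_per_ms * t_ms)
        row.append(envelope * math.cos(2 * math.pi * freq_khz * t_ms))
        row.append(envelope * math.sin(2 * math.pi * freq_khz * t_ms))
    row.extend(t_ms ** j for j in range(trend_order + 1))
    return row


def solve(a, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_period(times_ms, samples, formants, trend_order):
    """Least-squares fit of one pitch period; returns (coeffs, detrended)."""
    rows = [design_row(t, formants, trend_order) for t in times_ms]
    n = len(rows[0])
    # Normal equations (A^T A) x = A^T y.
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    aty = [sum(r[i] * y for r, y in zip(rows, samples)) for i in range(n)]
    coeffs = solve(ata, aty)
    # The trailing coefficients belong to the polynomial trend.
    trend_coeffs = coeffs[2 * len(formants):]
    trend = [sum(c * t ** j for j, c in enumerate(trend_coeffs)) for t in times_ms]
    detrended = [y - tr for y, tr in zip(samples, trend)]
    return coeffs, detrended


# Synthetic pitch period: two "formants" plus a linear trend (invented values).
formants = [(0.5, 0.3), (1.5, 0.4)]           # (kHz, decay per ms)
true_coeffs = [1.0, 0.5, -0.3, 0.2, 0.4, -0.05]
times = [i * 0.05 for i in range(160)]         # 8 ms sampled at 20 kHz
samples = [sum(c * b for c, b in zip(true_coeffs, design_row(t, formants, 1)))
           for t in times]
coeffs, detrended = fit_period(times, samples, formants, trend_order=1)
print([round(c, 6) for c in coeffs])
```

The `detrended` output is the waveform with the fitted polynomial removed, which corresponds to separating the "cycles" from the "trend" in the abstract's wording. Estimating the frequencies and decay rates themselves, which is the harder nonlinear part addressed by the paper, is outside this sketch.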

10.
Phonetica ; 77(5): 394-404, 2020.
Article in English | MEDLINE | ID: mdl-32756059

ABSTRACT

Strange as it may seem, von Kempelen's speaking machine from 1791 is the best result obtained in various attempts to build a mechanism similar to the speech apparatus, capable of producing a vocal signal. In this book discussion, we will illustrate von Kempelen's work, along with the attempts, across history, to build talking devices. We will highlight the 2 paths that have been followed over the centuries: "vocal transport" and "artificial voice." The first case was a trick, because the voice was produced by a hidden subject and transported through an artifice to a head or a statue. The other path, that of research, has tried to imitate the phonatory apparatus to produce sequences of sounds somewhat similar to those that make up the speech chain. Which of the 2 paths led to the production of today's synthesized speech? The trick or the research? We will try to answer this question.

11.
Int. j. odontostomatol. (Print) ; 14(2): 205-212, June 2020. tables, graphs
Article in English | LILACS | ID: biblio-1090676

ABSTRACT

Mapudungun is a language used by Mapuche people in some regions of Chile and Argentina. The aim of this study was to describe the vowel phonemes with regard to articulatory parameters (position of the tongue with respect to the palate and jaw opening) and acoustic parameters (f0, F1, F2 and F3) in Mapudungun speakers in the Region of La Araucanía. Mapudungun has six vowel phonemes: the first five are similar to those used in Spanish (/a e i o u/), to which is added a sixth vowel (/ɨ/) with its vocalic allophones [ɨ] and [ə]. Three Mapudungun speakers were evaluated. The tongue movements were recorded by 3D Electromagnetic Articulography and the data were processed with MATLAB and PRAAT software. It was possible to describe the trajectory of each third of the tongue during the production of the vowels. It was observed that the sixth vowel /ə/ had minimal jaw opening during its pronunciation. In addition, the characteristic of /ə/ as an unrounded mid-central vowel was corroborated. In this study, the tongue of Mapudungun speakers was in a more posterior position than that found in other studies.


Subjects
Humans, Male, Female, Adult, Speech Production Measurement/instrumentation, Tongue/physiology, Phonetics, South American Indians, Jaw/physiology, Speech Acoustics, Pilot Projects, Electromagnetic Phenomena
12.
Elife ; 9, 2020 02 17.
Article in English | MEDLINE | ID: mdl-32048990

ABSTRACT

Khoomei is a unique singing style originating from the republic of Tuva in central Asia. Singers produce two pitches simultaneously: a booming low-frequency rumble alongside a hovering high-pitched whistle-like tone. The biomechanics of this biphonation are not well-understood. Here, we use sound analysis, dynamic magnetic resonance imaging, and vocal tract modeling to demonstrate how biphonation is achieved by modulating vocal tract morphology. Tuvan singers show remarkable control in shaping their vocal tract to narrowly focus the harmonics (or overtones) emanating from their vocal cords. The biphonic sound is a combination of the fundamental pitch and a focused filter state, which is at the higher pitch (1-2 kHz) and formed by merging two formants, thereby greatly enhancing sound-production in a very narrow frequency range. Most importantly, we demonstrate that this biphonation is a phenomenon arising from linear filtering rather than from a nonlinear source.


The republic of Tuva, a remote territory in southern Russia located on the border with Mongolia, is perhaps best known for its vast mountainous geography and the unique cultural practice of "throat singing". These singers simultaneously create two different pitches: a low-pitched drone, along with a hovering whistle above it. This practice has deep cultural roots and has now been shared more broadly via world music performances and the 1999 documentary Genghis Blues. Despite many scientists being fascinated by throat singing, it was unclear precisely how throat singers could create two unique pitches. Singing and speaking in general involve making sounds by vibrating the vocal cords found deep in the throat, and then shaping those sounds with the tongue, teeth and lips as they move up the vocal tract and out of the body. Previous studies using static images taken with magnetic resonance imaging (MRI) suggested how Tuvan singers might produce the two pitches, but a mechanistic understanding of throat singing was far from complete. Now, Bergevin et al. have better pinpointed how throat singers can produce their unique sound. The analysis involved high quality audio recordings of three Tuvan singers and dynamic MRI recordings of the movements of one of those singers. The images showed changes in the singer's vocal tract as they sang inside an MRI scanner, providing key information needed to create a computer model of the process. This approach revealed that Tuvan singers can create two pitches simultaneously by forming precise constrictions in their vocal tract. One key constriction occurs when the tip of the tongue nearly touches a ridge on the roof of the mouth, and a second constriction is formed by the base of the tongue. The computer model helped explain that these two constrictions produce the distinctive sounds of throat singing by selectively amplifying a narrow set of high frequency notes that are made by the vocal cords. Together these discoveries show how very small, targeted movements of the tongue can produce distinctive sounds.


Subjects
Pharynx/physiology, Singing, Audiovisual Aids, Humans, Magnetic Resonance Imaging, Pharynx/diagnostic imaging, Russia
13.
Elife ; 9, 2020 02 12.
Article in English | MEDLINE | ID: mdl-32048994

ABSTRACT

MRI experiments have revealed how throat singers from Tuva produce their characteristic sound.


Subjects
Singing, Pharynx, Sound, Speech Acoustics
14.
Lang Speech ; 61(3): 430-465, 2018 Sep.
Article in English | MEDLINE | ID: mdl-29058989

ABSTRACT

This study investigates the production and auditory lexical processing of words involved in a patterned phonological alternation in two dialects of Catalan spoken on the island of Majorca, Spain. One of these dialects, that of Palma, merges /ɔ/ and /o/ as [o] in unstressed position, and it maintains /u/ as an independent category, [u]. In the dialect of Sóller, a small village, speakers merge unstressed /ɔ/, /o/, and /u/ to [u]. First, a production study asks whether the discrete, rule-based descriptions of the vowel alternations provided in the dialectological literature are able to account adequately for these processes: are mergers complete? Results show that mergers are complete with regards to the main acoustic cue to these vowel contrasts, that is, F1. However, minor differences are maintained for F2 and vowel duration. Second, a lexical decision task using cross-modal priming investigates the strength with which words produced in the phonetic form of the neighboring (versus one's own) dialect activate the listeners' lexical representations during spoken word recognition: are words within and across dialects accessed efficiently? The study finds that listeners from one of these dialects, Sóller, process their own and the neighboring forms equally efficiently, while listeners from the other one, Palma, process their own forms more efficiently than those of the neighboring dialect. This study has implications for our understanding of the role of lifelong linguistic experience on speech performance.


Subjects
Phonetics, Recognition (Psychology), Speech Acoustics, Speech Perception, Voice Quality, Acoustic Stimulation, Adult, Humans, Male, Speech Production Measurement, Young Adult
15.
Front Psychol ; 5: 1065, 2014.
Article in English | MEDLINE | ID: mdl-25339921

ABSTRACT

This paper examines to what extent acoustic similarity between native and non-native vowels predicts non-native vowel perception and whether this process is influenced by listeners' native and other non-native dialects. Listeners with Northern and Southern British English dialects completed a perceptual assimilation task in which they categorized tokens of 15 Dutch vowels in terms of English vowel categories. While the cross-language acoustic similarity of Dutch vowels to English vowels largely predicted Southern listeners' perceptual assimilation patterns, this was not the case for Northern listeners, whose assimilation patterns resembled those of Southern listeners for all but three Dutch vowels. The cross-language acoustic similarity of Dutch vowels to Northern English vowels was re-examined by incorporating Southern English tokens, which resulted in considerable improvements in the predicting power of cross-language acoustic similarity. This suggests that Northern listeners' assimilation of Dutch vowels to English vowels was influenced by knowledge of both native Northern and non-native Southern English vowel categories. The implications of these findings for theories of non-native speech perception are discussed.

16.
J Speech Lang Hear Res ; 56(4): 1260-71, 2013 Aug.
Article in English | MEDLINE | ID: mdl-23785190

ABSTRACT

PURPOSE: Children acquire /-əz/ syllabic plurals (e.g., buses) later than /-s, -z/ segmental plurals (e.g., cats, dogs). In this study, the authors explored whether increased syllable number or segmental factors best explain poorer performance with syllabic plurals. METHOD: An elicited imitation experiment was conducted with 14 two-year-olds involving 8 familiar disyllabic target plural nouns, half with syllabic plurals (e.g., bus → buses) and half with segmental plurals (e.g., letter → letters). Children saw pictures of the target items on a computer and repeated prerecorded 3-word utterances with the target word in utterance-medial position (e.g., "The buses come") and utterance-final position (e.g., "Hear the buses"). Acoustic analysis determined the presence or absence of the plural morpheme and its duration. RESULTS: Children had more trouble producing syllabic plurals compared with segmental plurals. Errors were especially evident in the utterance-medial position, where there was less time for the child to perceive/produce the word in the absence of phrase-final lengthening and where planning for the following word was still required. CONCLUSIONS: The results suggested that articulatory difficulties, rather than a word length effect, explain later acquisition of syllabic plurals relative to segmental plurals. These findings have implications for the nature of syllabic plural acquisition in children with hearing impairments and specific language impairment.


Subjects
Articulation Disorders/diagnosis, Child Language, Language Development, Phonetics, Speech Acoustics, Speech, Age Factors, Preschool Child, Female, Humans, Infant, Language Development Disorders/diagnosis, Male, Semantics